Informatics in Medicine Unlocked — Latest Matching Preprints

1

How can AI be compatible with evidence-based medicine?: with an example of analysis of lung cancer recurrence

Usuzaki, T.; Matsunbo, E.; Inamori, R.

2026-04-25 radiology and imaging 10.64898/2026.04.17.26351114 medRxiv

Top 0.1%

8.6%

Despite the remarkable progress of artificial intelligence, represented by large language models, how AI technologies can contribute to the construction of evidence in evidence-based medicine (EBM) remains an overlooked issue. We now need AI that is compatible with EBM. In the present paper, we propose an example analysis that may contribute to this goal, using a variable Vision Transformer.

2

Development and validation of neurological health score using machine learning algorithms

Pemmasani, S. K.; Athmakuri, S.; R G, S.; Acharya, A.

2026-02-12 health informatics 10.64898/2026.02.11.26346101 medRxiv

Top 0.1%

5.0%

Neurological health score (NHS), indicating the health of the brain and nervous system, helps in identifying high-risk individuals and in recommending lifestyle modifications. In the present study, we developed an NHS based on genetic, lifestyle and biochemical variables associated with eight neurological disorders: dementia, stroke, Parkinson's disease, amyotrophic lateral sclerosis, schizophrenia, bipolar disorder, multiple sclerosis and migraine. UK Biobank data from Caucasian individuals were used to develop the model, and data from individuals of Indian ethnicity were used to validate it. Logistic regression and XGBoost algorithms were used to select the significant variables for the disorders. The NHS developed from the selected variables was found to be highly significant after adjusting for age and sex (AUC: 0.6, OR: 0.95). Higher NHS was associated with a lower risk of neurological disorders and better social well-being. The highest NHS group (top 25%) showed a 1.3 times lower risk compared to the rest of the individuals. The results of our study help in developing a framework for quantifying neurological health in clinical settings.

3

Improving Medicare Fraud Detection Accuracy in Deep Learning by Exploring Feature Selection and Data Sampling Techniques.

Ahammed, F.

2026-03-20 health informatics 10.64898/2026.03.18.26348763 medRxiv

Top 0.1%

4.1%

Fraud in healthcare is a worsening problem, with far-reaching consequences that burden the financial stability of the health industry and threaten the quality of medical care. It results from vulnerabilities in the current healthcare framework that fraudsters exploit. Although many models have been developed to detect fraudulent patterns in insurance claims, their accuracy frequently suffers from class imbalance in the Medicare dataset and from irrelevant features. This study aims to improve detection performance and accuracy by employing a deep learning model together with data sampling and feature selection techniques. Different combinations are compared to determine how effectively they enhance the accuracy of the fraud detection model. The results indicate that combining data sampling and feature selection techniques improves accuracy and performance. Accuracy reached 95.4%, with negligible evidence of overfitting, when both Chi-square and Synthetic Minority Over-sampling Technique (SMOTE) methods were used. Ultimately, the findings underscore the value of employing combined techniques, rather than only the baseline deep learning model, for detecting Medicare insurance fraud.
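
The oversampling step mentioned in this abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority row is an interpolation between a real minority row and one of its nearest minority neighbours. This is a simplified illustration, not the preprint's pipeline; the function name `smote_sketch`, the toy data, and the parameter values are all assumptions made for the example, and the Chi-square feature selection step is not shown.

```python
import numpy as np

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a random
    minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(minority))
        neighbour = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = minority[base] + gap * (minority[neighbour] - minority[base])
    return synthetic

# Hypothetical toy data: 4 minority ("fraud") rows with 2 features each.
fraud = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
new_rows = smote_sketch(fraud, n_new=16)
balanced_fraud = np.vstack([fraud, new_rows])  # 20 rows after oversampling
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic rows do not leak into the evaluation data.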

4

Multi-task deep learning integrating pretreatment MRI and whole slide images predicts induction chemotherapy response and survival in locally advanced nasopharyngeal carcinoma

Hou, J.; Yi, X.; Li, C.; Li, J.; Cao, H.; Lu, Q.; Yu, X.

2026-04-11 radiology and imaging 10.64898/2026.04.07.26350350 medRxiv

Top 0.1%

4.0%

Predicting response to induction chemotherapy (IC) and overall survival (OS) is critical for optimizing treatment in patients with locally advanced nasopharyngeal carcinoma (LANPC). This study aimed to develop and validate a multi-task deep learning model integrating pretreatment MRI and whole slide images (WSIs) to predict IC response and OS in LANPC. Pretreatment MRI and WSIs from 404 patients with LANPC were retrospectively collected to construct a multi-task model (MoEMIL) for the simultaneous prediction of early IC response and OS. MoEMIL employed multi-instance learning to process WSIs, PyRadiomics and a convolutional neural network (ResNet50) to extract MRI features, and fused multimodal features through a multi-gate mixture-of-experts architecture. Clustering-constrained attention multiple instance learning and gradient-weighted class activation mapping were applied for visualization and interpretation. MoEMIL effectively stratified patients into good and poor IC response groups, achieving areas under the curve of 0.917, 0.869, and 0.801 in the train, validation, and test sets, respectively, and outperformed the deep learning radiomics model, the pathomics model and TNM staging. The model also stratified patients into high- and low-risk OS groups (P < 0.05). MoEMIL shows promise as a decision-support tool for early IC response prediction and prognostication in LANPC.
Author Summary: We have developed a deep learning model that integrates two types of medical images, magnetic resonance imaging (MRI) and digital pathological slices, to simultaneously predict response to induction chemotherapy and prognosis in patients with locally advanced nasopharyngeal carcinoma. Current treatment decisions primarily rely on traditional tumor staging (TNM), which often fails to comprehensively reflect the complexity of the disease. Our model, named MoEMIL, was trained and tested on data from 404 patients across two hospitals and consistently outperformed both single-model approaches and TNM staging methods. By identifying patients who exhibit poor response to induction chemotherapy or higher prognostic risk, our tool can assist clinicians in achieving personalized treatment, enabling intensified management for high-risk patients and avoiding unnecessary side effects for low-risk patients. Additionally, we visualize the model's reasoning process through heat map generation, which highlights the image regions exerting the greatest influence on prediction outcomes. This work represents a step toward more precise treatment for nasopharyngeal carcinoma; however, larger-scale prospective studies are required before the model can be integrated into routine clinical practice.

5

Computational Prediction of Plasmodium falciparum Antigen-T-cell Receptor Interactions via Molecular Docking: Implications for Malaria Vaccine Design

Kipkoech, G.; Kanda, W.; Irungu, B.; Nyangi, M.; Kimani, C.; Nyangacha, R.; Keter, L.; Atieno, D.; Gathirwa, J.; Kigondu, E.; Murungi, E.

2026-03-20 bioinformatics 10.64898/2026.03.18.712575 medRxiv

Top 0.2%

3.6%

Malaria is one of the deadliest diseases in sub-Saharan Africa and Southeast Asia. Most fatalities occur in children under 5 years and pregnant women and are due to infection by Plasmodium spp., of which Plasmodium falciparum is the most virulent and is responsible for most of the morbidity and mortality. Despite various public health interventions such as the use of insecticide-treated bed nets, spraying of homes with insecticides and use of WHO-recommended artemisinin-based combination therapies (ACT), malaria prevention still faces major setbacks due to drug resistance in P. falciparum and insecticide resistance in mosquitoes. The study uses molecular docking and immunoinformatics to screen various Plasmodium spp. antigens and evaluate their antigenicity and suitability as vaccine candidates. The P. falciparum antigens and T-cell receptor (TCR) structures were obtained from the Protein Data Bank (PDB) based on a range of factors related to their role in the lifecycle of the parasite and their status as vaccine targets. Protein structures not available in the PDB were predicted using AlphaFold. The 3D structures of the selected P. falciparum antigens and TCRs were downloaded in PDB format, then all water molecules, HETATM records, and bound ligands were deleted from the protein structures using BIOVIA Discovery Studio Visualizer. Subsequently, molecular docking was done using the ClusPro v2.0 server and the docked complexes were compared. The findings of this study gave valuable insights into the interaction of the human immune response with P. falciparum antigens. The three best-ranked antigen complexes are PfCyRPA, PfMSP10 and PfCSP, and this confirms their use as potential candidates for vaccine development. This study highlights the usefulness of computational docking in identifying P. falciparum antigens of excellent immunogenic potential as vaccine candidates.

6

CardioPulmoNet: Modeling Cardiopulmonary Dynamics for Histopathological Diagnosis

Pham, T. D.

2026-02-20 health informatics 10.64898/2026.02.19.26346620 medRxiv

Top 0.2%

3.6%

Objective: This study investigates whether incorporating physiological coupling concepts into neural network design can support stable and interpretable feature learning for histopathological image classification under limited data conditions. Methods: A physiologically inspired architecture, termed CardioPulmoNet, is introduced to model interacting feature streams analogous to pulmonary ventilation and cardiac perfusion. Local and global tissue features are integrated through bidirectional multi-head attention, while a homeostatic regularization term encourages balanced information exchange between streams. The model was evaluated on three histopathological datasets involving oral squamous cell carcinoma, oral submucous fibrosis, and heart failure. In addition to end-to-end training, learned representations were assessed using linear support vector machines to examine feature separability. Results: CardioPulmoNet achieved performance comparable to several pretrained convolutional neural networks across the evaluated datasets. When combined with a linear classifier, improved classification performance and higher area under the receiver operating characteristic curve were observed, suggesting that the learned feature embeddings are well structured for downstream discrimination. Conclusion: These results indicate that physiologically motivated architectural constraints may contribute to stable and discriminative representation learning in computational pathology, particularly when training data are limited. The proposed framework provides a step toward integrating physiological modeling principles into medical image analysis and may support future development of transferable and interpretable learning systems for histopathological diagnosis.

7

An Exploratory Study of ResNet and Capsule Neural Networks for Brain Tumor Detection in MRI

Mensah, S.; Atsu, E. K. A.; Ammah, P. N. T.

2026-02-09 radiology and imaging 10.64898/2026.02.05.26345460 medRxiv

Top 0.2%

3.5%

Brain tumors are one of the most life-threatening diseases, requiring precise and timely detection for effective treatment. Traditional methods for brain tumor detection rely heavily on manual analysis of MRI scans, which is time-consuming, subjective, and prone to human error. With advancements in deep learning, Convolutional Neural Networks (CNNs) have become popular for medical image analysis. However, CNNs are limited in their ability to capture spatial hierarchies and pose variations, which reduces their accuracy, particularly for tasks like brain tumor segmentation where precise spatial relationships are crucial. This research introduces a hybrid Capsule Neural Network (CapsNet) and ResNet50 model designed to overcome the limitations of traditional CNNs by capturing both spatial and pose information in MRI scans. The proposed model leverages ResNet50 for feature extraction and CapsNet for handling spatial relationships, leading to more accurate segmentation. The study evaluates the model on the BraTS2020 dataset and compares its performance to state-of-the-art CNN architectures, including U-Net and pure CNN models. The hybrid model, featuring a custom 5-cycle dynamic routing algorithm to enhance capsule agreement for tumor boundaries, achieved 98% accuracy and an F1-score of 0.87, demonstrating superior performance in detecting and segmenting brain tumors. This study pioneers the systematic evaluation of the ResNet50 + CapsNet hybrid on the BraTS2020 dataset, with a tailored class weighting scheme addressing class imbalance, improving effectiveness in identifying irregularly shaped tumors and smaller tumor regions. The study offers a robust solution for automating brain tumor detection.
Future work will explore the use of Capsule Networks alone for brain tumor detection in MRI data and investigate alternative Capsule Network architectures, as well as their integration into clinical decision support systems.

8

TOXsiRNA: A web server to predict the toxicity of chemically modified siRNAs

Dar, S.; Kumar, M.

2026-02-14 bioinformatics 10.64898/2026.02.12.705521 medRxiv

Top 0.2%

3.1%

Small interfering RNAs (siRNAs) are often chemically modified to enhance their properties for use in molecular biology research and therapeutic applications. Toxicity may arise from these chemical moieties as well as from sequence-based off-target effects at the cellular level. Enormous resources are required to experimentally design and test the toxicity of these chemical modifications and their combinations on siRNAs. To address this problem, we developed the TOXsiRNA web server to computationally predict the toxicity of chemically modified siRNAs and their off-targets. We selected 2749 siRNAs carrying different permutations and combinations of 21 different chemical modifications. Next, we used Support Vector Machine (SVM), Linear Regression (LR), K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) machine learning methods to develop models. The best performance was obtained with a mononucleotide composition-based model developed with SVM, which gave Pearson Correlation Coefficients (PCC) of 0.91 and 0.92 on training/testing and independent validation, respectively. Other sequence features such as dinucleotide composition, binary pattern and their combinations were also tested. Finally, three models for chemically modified siRNAs were implemented on the web server. Other algorithms, including prediction of knockdown efficacy for normal as well as chemically modified siRNAs and off-target prediction, are also integrated. The resource is freely hosted online for scientific use at http://bioinfo.imtech.res.in/manojk/toxsirna.

9

Cross-Attention Enables Context-Aware Multimodal Skin Lesion Diagnosis

Mridha, K.; Islam, H.

2026-03-11 health informatics 10.64898/2026.03.10.26348046 medRxiv

Top 0.3%

2.4%

Clinical diagnosis of skin lesions integrates visual dermoscopic features with patient context such as age, skin type, and lesion characteristics. However, most artificial intelligence systems for dermoscopic analysis rely solely on image data and ignore structured clinical metadata. We developed a multimodal deep learning framework that combines dermoscopic images with patient metadata and evaluated whether cross-attention mechanisms better capture contextual interactions than conventional fusion strategies. Using 1,568 lesions from the PAD-UFES-20 dataset (69% malignant) with associated metadata (age, sex, Fitzpatrick skin type, anatomical site, and lesion diameter), we compared four models: metadata-only logistic regression, image-only ResNet18, late fusion via feature concatenation, and cross-attention-based fusion. The image-only model achieved strong discrimination (AUC 0.9776), while late fusion slightly reduced performance (AUC 0.9717). The proposed cross-attention model achieved the best overall results (AUC 0.9818, AUPRC 0.9924) with improved calibration (ECE 0.0379). These findings suggest that attention-based multimodal learning enables more effective integration of patient context for automated skin lesion diagnosis.
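
The contrast between late fusion and cross-attention fusion in this abstract can be sketched as follows. This is a minimal single-head NumPy illustration under stated assumptions, not the authors' architecture: it assumes metadata embeddings act as queries over image patch embeddings, and the token counts, embedding size, and random weights are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries come from one
    modality, keys and values from the other."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 8  # hypothetical embedding size
image_tokens = rng.normal(size=(49, d_model))  # e.g. a flattened 7x7 feature map
meta_tokens = rng.normal(size=(5, d_model))    # one embedding per metadata field
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Cross-attention fusion: every metadata field attends over image regions.
fused, attn = cross_attention(meta_tokens, image_tokens, w_q, w_k, w_v)

# Late fusion for comparison: pool each modality, then concatenate.
late_fusion = np.concatenate([image_tokens.mean(axis=0), meta_tokens.mean(axis=0)])
```

The design difference is that late fusion combines the modalities only after each has been pooled, while cross-attention lets each metadata field weight individual image regions before pooling.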

10

MCA-UNet: A Multi-Scale Context and Attention U-Net for Colorectal Polyp Segmentation

Dong, Y.; Fang, G.; Du, R.; Hu, H.; Fang, Z.; Guo, C.; Lu, R.; Jia, Y.; Tian, Y.; Wang, Z.

2026-03-16 gastroenterology 10.64898/2026.03.11.26348049 medRxiv

Top 0.3%

2.4%

Introduction: To propose an improved U-Net-based segmentation model for colorectal polyp segmentation, aiming to address the challenges of variable lesion morphology, ambiguous boundaries, complex background interference, and insufficient cross-level feature fusion in endoscopic images [5,12]. Methods: An improved network termed MCA-UNet was developed based on U-Net [5]. The model incorporates a multi-scale context convolution block (MCCB) to enhance multi-scale feature extraction and an attention-guided feature fusion module (AGFF) to optimize skip-feature selection and fusion in the decoder. Experiments were conducted on publicly available colorectal polyp image datasets, including Kvasir-SEG and CVC-ClinicDB [13-15]. Four models, including U-Net, U-Net+MCCB, U-Net+AGFF, and MCA-UNet, were compared, and all models were trained for 100 epochs. Dice, intersection over union (IoU), and mean absolute error (MAE) were used as the main evaluation metrics [20]. Results: On the mixed validation set, the Dice scores of U-Net, U-Net+MCCB, U-Net+AGFF, and MCA-UNet were 0.742, 0.771, 0.754, and 0.783, respectively; the corresponding IoU values were 0.603, 0.635, 0.618, and 0.649; and the MAE values were 0.102, 0.090, 0.097, and 0.086. Compared with the baseline U-Net, MCA-UNet improved Dice and IoU by 5.53% and 7.63%, respectively, while reducing MAE by 15.69%. Comparisons on the Kvasir-SEG and CVC-ClinicDB validation subsets further demonstrated the more stable performance of the proposed model. Conclusion: By jointly integrating multi-scale contextual modeling and attention-guided feature fusion, MCA-UNet effectively improves the accuracy and robustness of colorectal polyp segmentation and may provide useful support for intelligent endoscopic image analysis [12,17,18].
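
The three metrics named in this abstract, and the relative improvements it reports, can be reproduced with a short sketch. The metric definitions are the standard ones; the 4x4 toy masks are hypothetical, and the helper names are chosen for this example only. The functions assume non-empty masks (no zero-division guard).

```python
import numpy as np

def dice(pred, target):
    """Dice = 2|A and B| / (|A| + |B|) for boolean masks."""
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum())

def iou(pred, target):
    """IoU = |A and B| / |A or B| for boolean masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union

def mae(prob, target):
    """Mean absolute error between a prediction map and the ground truth."""
    return np.abs(prob - target).mean()

def relative_change_pct(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Hypothetical toy masks: 4 true pixels, 6 predicted pixels, 4 overlapping.
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

toy_dice = dice(pred, target)                            # 2*4 / (6+4) = 0.8
toy_iou = iou(pred, target)                              # 4 / 6
toy_mae = mae(pred.astype(float), target.astype(float))  # 2 / 16 = 0.125

# Relative changes computed from the values reported in the abstract.
dice_gain = round(relative_change_pct(0.742, 0.783), 2)  # 5.53
iou_gain = round(relative_change_pct(0.603, 0.649), 2)   # 7.63
mae_drop = round(relative_change_pct(0.102, 0.086), 2)   # -15.69
```

The reported 5.53%, 7.63% and 15.69% figures are therefore relative changes against the U-Net baseline, not absolute differences in the metric values.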

11

The Risk Factors, Detection and Classification of Esophageal Cancer Using Ensemble Machine Learning Models

Gaso, M. S.; Mekuria, R. R.; Cankurt, S.; Deybasso, H. A.; Abdo, A. A.; Abbas, G. H.

2026-03-11 health informatics 10.64898/2026.03.09.26347944 medRxiv

Top 0.3%

2.3%

Esophageal cancer (EC) remains one of the most lethal malignancies worldwide, with poor survival outcomes largely attributable to late-stage diagnosis and limited treatment effectiveness. Early detection and accurate risk stratification are therefore essential for improving clinical management. In this study, we investigate the predictive value of socio-demographic, dietary, behavioral, environmental, and clinical variables collected from 312 individuals (104 EC cases and 208 controls) in the Arsi Zone, Ethiopia. An ensemble feature ranking approach based on Random Forest machine learning was first applied to identify the most relevant predictive features. Subsequently, multiple ensemble machine learning models were evaluated, including Histogram-based Gradient Boosting (Model I), Extreme Gradient Boosting (Model II), AdaBoost (Model III), Random Forest (Model IV), and k-Nearest Neighbors (Model V). These models were tested under multiple experimental settings using both full and reduced feature subsets. To enhance robustness and minimize variability, a multi-seed ensemble framework was employed. Different seed values generate distinct train-test splits and slight variations in model initialization and optimization, leading to minor differences in training outcomes; aggregating results across multiple seeds mitigates this variability and provides more stable and reliable performance estimates. The experimental results demonstrate that boosting-based ensemble models consistently outperform other classifiers across all evaluation metrics. Model I achieved the highest overall performance, reaching an accuracy of 0.983, with precision of 0.982, recall of 0.980, and F1-score of 0.981 using the reduced feature set, while maintaining nearly identical performance with the full feature set. Model II also showed stable and strong predictive capability, achieving accuracies of 0.963 and 0.961 for the full and reduced feature sets, respectively, with balanced precision, recall, and F1-score values. These findings indicate that feature importance-based dimensionality reduction preserves essential predictive information without compromising classification performance. Overall, the results highlight the significant predictive contribution of dietary and environmental risk factors and demonstrate that ensemble learning provides a reliable, efficient, and clinically meaningful approach for early EC detection. The proposed framework offers a promising direction for supporting diagnostic decision-making and risk stratification in resource-limited healthcare settings.

Highlights:
- Machine Learning Framework for Esophageal Cancer Classification: A robust ensemble machine learning framework was developed to classify esophageal cancer using socio-demographic, dietary, behavioral, environmental, and clinical risk factors, enabling accurate and reliable disease prediction.
- Multi-Seed Ensemble Strategy for Improved Model Stability: A novel multi-seed ensemble classification approach was implemented to reduce model variance and improve robustness by aggregating predictions across multiple randomized training and testing splits.
- Ensemble Feature Ranking for Optimal Feature Selection: An ensemble Random Forest-based feature ranking framework was designed to identify the most predictive features, ensuring stable biomarker selection and improved model interpretability.
- High Classification Performance with Reduced Feature Set: The proposed ensemble HGBC model achieved outstanding performance with 98.3% accuracy, 98.2% precision, 98.0% recall, and 98.1% F1-score using a reduced feature subset, demonstrating efficient dimensionality reduction without performance loss.
- Exceptional Discriminative Ability with Near-Perfect AUC: The ensemble HGBC model achieved an AUC of 0.994, indicating excellent discrimination between cancer and non-cancer cases and confirming its suitability for high-precision clinical decision support.
- Zero False-Negative Predictions and Maximum Diagnostic Sensitivity: The proposed model achieved zero false negatives in evaluation, resulting in 100% statistical power and perfect sensitivity, ensuring reliable detection of esophageal cancer cases.
- Identification of Key Dietary and Environmental Risk Factors: Feature importance analysis revealed that dietary habits, hot food consumption, environmental exposures, and behavioral factors are among the most significant predictors of esophageal cancer risk.
- Ensemble Learning Outperforms Traditional Machine Learning Models: Boosting-based ensemble models, particularly HGBC and XGBoost, consistently outperformed other classifiers, demonstrating superior predictive accuracy, stability, and robustness.
- Efficient and Interpretable AI Framework for Clinical Decision Support: The proposed framework balances high predictive accuracy with interpretability, making it suitable for assisting clinicians in early diagnosis and risk stratification of esophageal cancer.
- AI-Driven Solution for Resource-Constrained Healthcare Settings: The proposed ensemble machine learning approach provides an effective and scalable diagnostic support tool, particularly valuable for healthcare systems with limited resources and access to specialized medical expertise.
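
The multi-seed idea described in this abstract (a different train-test split per seed, with results aggregated across seeds) can be sketched as below. This is an illustration only: the data are synthetic stand-ins with the same group sizes (208 and 104), a nearest-centroid rule replaces the boosting models, and the sketch aggregates accuracy estimates rather than per-sample predictions. All names are hypothetical.

```python
import numpy as np

def nearest_centroid_accuracy(x_train, y_train, x_test, y_test):
    """Stand-in classifier: assign each test row to the closer class centroid."""
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return float((dists.argmin(axis=1) == y_test).mean())

def multi_seed_accuracy(x, y, seeds, test_fraction=0.25):
    """Repeat a random train-test split once per seed and aggregate accuracy."""
    scores = []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(x))
        n_test = int(round(test_fraction * len(x)))
        test_idx, train_idx = order[:n_test], order[n_test:]
        scores.append(nearest_centroid_accuracy(
            x[train_idx], y[train_idx], x[test_idx], y[test_idx]))
    scores = np.array(scores)
    return scores.mean(), scores.std(ddof=1), scores

# Synthetic stand-in data: 208 "controls" and 104 "cases", 5 features each.
data_rng = np.random.default_rng(42)
x = np.vstack([data_rng.normal(0.0, 1.0, size=(208, 5)),
               data_rng.normal(1.0, 1.0, size=(104, 5))])
y = np.array([0] * 208 + [1] * 104)

mean_acc, std_acc, per_seed = multi_seed_accuracy(x, y, seeds=range(10))
print(f"accuracy over 10 seeds: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Reporting the spread across seeds alongside the mean shows how much of a headline score depends on one particular split, which matters for a dataset of a few hundred individuals.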

12

Automated Segmentation of Head and Neck Cancer from CT Images Using 3D Convolutional Neural Networks

Prabhanjans, P.; Punathil, A. N.; V K, A.; Thomas T, H. M.; Sasidharan, B. K.; Shaikh, H.; Varghese, A. J.; Kuchipudi, R. B.; Pavamani, S.; Rajan, J.

2026-03-13 radiology and imaging 10.64898/2026.03.12.26347996 medRxiv

Top 0.3%

2.1%

Head and neck cancer (HNC) requires accurate tumor delineation for effective radiotherapy planning. Manual segmentation of tumor regions is time-consuming and subject to considerable inter-observer variability. Although several automated approaches have been proposed, many rely on multimodal imaging such as PET/CT, which is expensive, less accessible in many clinical settings, and increases the burden on patients. In this work, we investigate a CT-only three-dimensional segmentation framework that provides a clinically practical and resource-efficient alternative. CT images of 136 head and neck cancer patients from the publicly available HN1 dataset in The Cancer Imaging Archive (TCIA) were used along with 30 additional cases from a private dataset collected at a tertiary care centre, Christian Medical College (CMC), Vellore, India. A fully automated segmentation model was developed to delineate the primary gross tumor volume (GTV) using the 3D nnU-Net framework. The models were trained using the HN1 dataset and an extended HN1+CMC dataset that included the additional private cases. Performance was evaluated using three-fold cross-validation with standard segmentation metrics including Dice Similarity Coefficient (DSC), Intersection over Union (IoU), and the 95th percentile Hausdorff Distance (HD95). The proposed CT-based model achieved a Global Dice of 0.63 and a Median Dice of 0.60 on the HN1 dataset. When the additional CMC cases were incorporated during training, the performance improved to a Global Dice of 0.65 and a Median Dice of 0.71. These results demonstrate that 3D nnU-Net can effectively segment head and neck tumors from CT images alone. The proposed CT-only approach provides a cost-effective and scalable solution that can support radiotherapy treatment planning and help reduce variability in clinical workflows.

13

Drug Repurposing: A Potential Therapeutic Strategy for the Treatment of Chikungunya Virus

Zondi, S.; Mtambo, S.; Buthelezi, N.; Shunmugam, L.; Magwenyane, A.; Kumalo, H. M.

2026-02-19 bioinformatics 10.64898/2026.02.19.706773 medRxiv

Top 0.3%

2.0%

Chikungunya virus (CHIKV) infection is one of the major public health concerns in several countries around the world. CHIKV non-structural protein 2 (nsP2) is a promising drug design target due to the enzyme's multifunctional properties, which facilitate viral replication and propagation. To date, there is an evident lack of preventative and therapeutic developments that can be used against CHIKV. Drug repurposing is a time-saving and cost-effective method for developing new drugs. In this study, drug repurposing was implemented using HIV/HCV protease inhibitors to inhibit the active site of nsP2. Molecular dynamics simulations and analysis revealed the stability of two drugs, Indinavir and Paritaprevir. Indinavir forms a hydrogen bond with a major residue, which closes the flexible loop situated in close proximity to the active site. This conformational shift in the orientation of the enzyme prevents access to the active site, thus preventing the nsP2 protein from functioning effectively in viral replication. In conclusion, this study identified Indinavir as a promising CHIKV nsP2 inhibitor. This study provides a basis for further drug repurposing as an alternative approach to the design of inhibitors of CHIKV as well as other viral families.

14

When clinical prediction models do not generalize: a simulation study in liver transplantation

Brulhart, D.; Magini, G.; Schafer, A.; Schwab, S.; Held, U.

2026-03-20 health informatics 10.64898/2026.03.19.26348780 medRxiv

Top 0.3%

1.9%

Objectives: Clinical prediction models estimate the risk of a future outcome in patients. Such models are often externally validated using independent datasets; however, even when a model has been rigorously validated in a new setting and patient population, its performance across other clinical settings remains unclear. Therefore, we systematically evaluated model performance and clinical utility across diverse patient populations to quantify the limits of transportability. Methods: Using liver transplantation as an example, we used the UK donation-after-circulatory-death (DCD) risk score and descriptive statistics from Swiss DCD liver transplant populations to simulate realistic target populations with varying donor and recipient characteristics. The risk score's ability to predict one-year graft failure was evaluated using calibration intercept, calibration slope, area under the receiver operating characteristic (ROC) curve, and net benefit. Results: The UK DCD Risk Score's performance depended heavily on the simulated population characteristics. While the score performed adequately in settings similar to those where it was derived, it was not satisfactory in others. Discussion: The study showed, using a risk score in liver transplantation as an example, that the application of a prediction model can be limited in certain external populations when they differ, and that its transportability in new settings is not guaranteed. Conclusion: This study highlights the importance of external validation of clinical prediction models to determine transportability to various target populations. Their application requires careful consideration and potential model re-estimation.
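
Of the performance measures listed in this abstract, net benefit is the least familiar and the easiest to show in a few lines. The sketch below uses the standard decision-curve definition, net benefit = TP/n - (FP/n) * pt / (1 - pt), where pt is the risk threshold. The outcomes, predicted risks and threshold are hypothetical toy values, not data from the study.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical toy cohort: 1 = graft failure within one year, 0 = no failure.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
risk = [0.8, 0.1, 0.3, 0.6, 0.2, 0.4, 0.05, 0.15, 0.5, 0.1]

nb_model = net_benefit(y_true, risk, threshold=0.25)    # 2/10 - (3/10)*(1/3) = 0.1
nb_all = net_benefit_treat_all(y_true, threshold=0.25)  # 0.3 - 0.7*(1/3)
```

A model has clinical utility at a given threshold only if its net benefit exceeds both the treat-all strategy and treat-none (net benefit 0), which is why a score can discriminate reasonably well in a new population and still fail on this measure.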

15

Pneumonia Detection in Paediatric Chest X-Rays using Ensembled Large Language Models

Tan, J.; Tang, P. H.

2026-04-12 radiology and imaging 10.64898/2026.04.10.26347909 medRxiv

Top 0.3%

1.9%

Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXRs) are an important tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. The ability to communicate findings to both clinicians and laypersons allows multimodal large language models (MLLMs) to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers. Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared to the baseline average agent for paediatric radiological pneumonia detection. Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPTOSS-20B aggregation were compared against the average agent performance. The primary metric evaluated was OvR AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and OvO AUROC. Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-score (p_balanced = 0.0028) for the balanced dataset. Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs, having potential for integration into emergency departments. Our system's high specificity supports triage by flagging high-risk radiological pneumonia cases.
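
The difference between majority voting and soft voting, two of the ensemble strategies compared in this abstract, can be shown with a small sketch. It assumes each agent emits a probability vector over the five likelihood categories; how the study derives per-agent scores is not stated in the abstract, so that representation, the three-agent example and the function names are assumptions.

```python
import numpy as np

def majority_vote(agent_probs):
    """Each agent votes for its top category; the most frequent vote wins
    (ties resolve to the lowest category index)."""
    votes = agent_probs.argmax(axis=1)
    counts = np.bincount(votes, minlength=agent_probs.shape[1])
    return int(counts.argmax())

def soft_vote(agent_probs):
    """Average the agents' probability vectors, then pick the top category."""
    mean_probs = agent_probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs

# Hypothetical outputs from three agents over five likelihood categories.
agent_probs = np.array([
    [0.05, 0.10, 0.40, 0.35, 0.10],  # weak preference for category 2
    [0.05, 0.10, 0.40, 0.35, 0.10],  # weak preference for category 2
    [0.00, 0.05, 0.05, 0.85, 0.05],  # strong preference for category 3
])

hard_label = majority_vote(agent_probs)          # category 2 (two votes to one)
soft_label, mean_probs = soft_vote(agent_probs)  # category 3 (highest mean probability)
```

Soft voting keeps each agent's confidence, so one confident agent can outweigh two uncertain ones. The averaged probabilities also give a continuous score per category, which is what a metric such as AUROC is computed from.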

16

Thyroid Cancer Risk Prediction from Multimodal Datasets Using Large Language Model

Ray, P.

2026-03-06 health informatics 10.64898/2026.03.05.26347766 medRxiv

Top 0.4%

1.9%

Thyroid carcinoma is one of the most prevalent endocrine malignancies worldwide, and accurate preoperative differentiation between benign and malignant thyroid nodules remains clinically challenging. Current diagnostic practice relies on practitioners' personal judgment to evaluate imaging results and clinical tests separately, which introduces inconsistency and leads to incorrect evaluations. Combining radiological imaging with clinical information systems enables healthcare providers to make more reliable predictions about patient outcomes and to improve their decision-making. This study introduces a multimodal deep learning framework that combines magnetic resonance imaging (MRI) data with clinical text to predict thyroid cancer. The system uses a Vision Transformer (ViT) to extract high-level features from MRI scans, while a domain-adapted language model processes clinical documents containing patient medical history, symptoms, and laboratory results. A cross-modal attention mechanism merges the imaging features with the textual information and helps to identify how the two types of data are interconnected. A classification layer then maps the fused features to a probability of malignancy. Experimental results show that the proposed multimodal system outperforms the unimodal baselines in accuracy, sensitivity, specificity, and AUC, which can help medical personnel make better preoperative decisions.
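
The abstract does not specify how its cross-modal attention is built. As a rough illustration only, the sketch below uses plain scaled dot-product attention in which image tokens act as queries over text tokens. The learned projection matrices of a real model are omitted, and the token values and names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all key/value pairs."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return fused


# Hypothetical 2-dimensional embeddings: two image tokens, three text tokens.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Image tokens query the text tokens; the result has one fused vector per image token.
fused = cross_attention(image_tokens, text_tokens, text_tokens)
print(fused)
```

Each fused vector is a convex combination of text embeddings weighted by similarity to an image token. In a full model, such fused features would then be passed to the classification layer.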

17

Adversarial Robustness of Capsule Networks for Medical Image Classification

Srinivasan, A.; Sritharan, D. V.; Chadha, S.; Fu, D.; Hossain, J. O.; Breuer, G. A.; Aneja, S.

2026-03-10 health informatics 10.64898/2026.03.09.26347900 medRxiv

Top 0.4%

1.9%

Purpose: Deep learning models are increasingly being used in medical diagnostics, but their vulnerability to adversarial perturbations raises concerns about their reliability in clinical applications. Capsule networks (CapsNets) are a promising architecture for medical imaging tasks, given their ability to model spatial relationships and train with smaller amounts of data. Although previous studies have focused on adversarial training approaches to improve robustness, alternative architectures remain an underexplored direction for combating poor adversarial stability. Prior work has suggested that CapsNets may exhibit improved robustness to adversarial perturbations compared to convolutional neural networks (CNNs), but performance on adversarial images has not been studied systematically in clinical environments. We evaluated the robustness of CapsNets compared to CNNs and vision transformers (ViTs) across multiple medical image classification tasks. Methods: We trained two CNNs (ResNet-18 and ResNet-50), one ViT (MedViT), and two CapsNets (DR-CapsNet and BP-CapsNet) on four distinct medical imaging datasets (PneumoniaMNIST, BreastMNIST, NoduleMNIST3D, and BloodMNIST) and one natural image dataset (MNIST). Models were evaluated on adversarial examples generated by projected gradient descent and fast gradient sign method across a range of perturbation bounds. Interpretability experiments, including latent space and Gradient-weighted Class Activation Mapping (Grad-CAM) analyses, were conducted to better understand model stability on adversarial inputs. Results: CapsNets demonstrated superior robustness under adversarial perturbations compared to CNNs and ViTs across all medical imaging datasets and the natural image dataset. Latent space and Grad-CAM visualizations revealed that CapsNets maintained more consistent embedding representations and attention maps after adversarial perturbations compared to CNNs and ViTs, suggesting that advantages in CapsNet robustness are supported, at least in part, by more stable feature encodings. Bayes-Pearson routing further improved robustness over standard dynamic routing in CapsNets without compromising baseline performance, suggesting a potential architectural improvement. Conclusion: CapsNets exhibit intrinsic advantages in adversarial robustness over CNN- and ViT-based models on medical imaging tasks, suggesting they are a reliable alternative for medical image classification. These findings support the use of CapsNets in clinical applications where model reliability is critical.
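
For readers unfamiliar with the fast gradient sign method (FGSM), the sketch below applies it to a toy logistic-regression classifier rather than the deep networks used in the study. The weights, input, and perturbation bound are hypothetical. Projected gradient descent, also used in the study, iterates a similar step with projection back into the perturbation bound and is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One-step FGSM: x_adv = x + eps * sign(dLoss/dx) for binary cross-entropy loss."""
    p = predict(w, b, x)
    # For logistic regression, the gradient of the loss w.r.t. the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input; the true label is 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
eps = 0.3  # L-infinity perturbation bound

x_adv = fgsm(w, b, x, y, eps)
print(predict(w, b, x), predict(w, b, x_adv))
```

Here a perturbation of at most 0.3 per feature pushes the predicted probability for the true class from above 0.5 to below it. That kind of failure is what the robustness evaluation measures across a range of perturbation bounds.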

18

Preoperative CT-Based Habitat Radiomics Classifiers Predict Recurrence in Non-Small Cell Lung Cancer

Altinok, O.; Ho, W. L. J.; Robinson, L.; Goldgof, D.; Hall, L. O.; Guvenis, A.; Schabath, M. B.

2026-04-16 radiology and imaging 10.64898/2026.04.14.26350899 medRxiv

Top 0.4%

1.8%

Objectives: Among surgically resected non-small cell lung cancer (NSCLC) patients with similar stage and histopathological characteristics, there is variability in patient outcomes, which highlights the urgency of identifying biomarkers to predict recurrence. The goal of this study was to systematically develop a pre-surgical CT-based, habitat-based radiomics classifier to predict risk of recurrence in NSCLC. Methods: This study included 293 NSCLC patients with surgically resected stage IA-IIIA disease who were randomly divided into training (n = 195) and test (n = 98) cohorts. From pre-surgical CT images, tumor habitats were generated using two-level unsupervised clustering, and radiomic features were then calculated from the intratumoral region and habitat-defined subregions. Using ridge-regularized logistic regression, separate classifiers were developed to predict 3-year recurrence using intratumoral radiomics, habitat-based radiomics, and a combined model (intratumoral and habitat) which was generated using a stacked learning framework. For each classifier, the probability of recurrence was calculated for each patient, and numerous statistical and machine learning approaches were then used to stratify patients for recurrence-free survival. Results: The combined radiomics classifier yielded a superior AUC (0.82) compared to the intratumoral (AUC = 0.75) and habitat radiomics (AUC = 0.81) models. When the classifiers were used to stratify high- versus low-risk patients using a cut-point identified by decision tree analysis, the combined classifier yielded the largest risk estimate for high-risk patients (HR = 8.43; 95% CI 2.47 - 28.81) compared to the habitat (HR = 5.41; 95% CI 2.08 - 14.09) and intratumoral radiomics (HR = 3.54; 95% CI 1.45 - 8.66) models. SHAP analyses indicated that habitat-derived information contributed most strongly to recurrence prediction. Conclusions: This study revealed that habitat-based radiomics provided statistical performance superior to that of intratumoral radiomics for predicting recurrence in NSCLC.
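
As a reminder of what the reported AUC values measure, the sketch below computes AUC as the probability that a randomly chosen recurrence case receives a higher predicted score than a randomly chosen non-recurrence case. The scores and labels are made up for illustration and are not taken from the study.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted recurrence probabilities and observed 3-year recurrence labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

value = auc(scores, labels)
print(round(value, 3))
```

Of the nine positive-negative pairs, eight are ranked correctly, so the AUC is 8/9. An AUC of 0.82 means that about 82% of such pairs are ordered correctly by the classifier.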

19

Predicting Rectal Cancer Patient Survival with Dutch Radiology Reports using Natural Language Processing (NLP): The Role of Pretrained Language Models

Cai, L.; Zhang, T.; Beets-Tan, R.; Brunekreef, J.; Teuwen, J.

2026-01-30 health informatics 10.64898/2026.01.23.26344428 medRxiv

Top 0.4%

1.8%

The use of Electronic Health Records (EHRs) has increased significantly in recent years. However, a substantial portion of the clinical data remains in unstructured text formats, especially in the context of radiology. This limits the application of EHRs for automated analysis in oncology research. Pretrained language models have been utilized to extract feature embeddings from these reports for downstream clinical applications, such as treatment response and survival prediction. However, a thorough investigation into which pretrained models produce the most effective features for rectal cancer survival prediction has not yet been done. This study explores the performance of five Dutch pretrained language models, including two publicly available models (RobBERT and MedRoBERTa.nl) and three developed in-house for the purpose of this study (RecRoBERT, BRecRoBERT, and BRec2RoBERT) with training on distinct Dutch-only corpora, in predicting overall survival and disease-free survival outcomes in rectal cancer patients. Our results showed that our in-house developed BRecRoBERT, a RoBERTa-based language model trained from scratch on a combination of Dutch breast and rectal cancer corpora, delivered the best predictive performance for both survival tasks, achieving a C-index of 0.65 (0.57, 0.73) for overall survival and 0.71 (0.64, 0.78) for disease-free survival. It outperformed models trained on general Dutch corpora (RobBERT) or Dutch hospital clinical notes (MedRoBERTa.nl). BRecRoBERT demonstrated the potential capability to predict survival in rectal cancer patients using Dutch radiology reports at diagnosis. This study highlights the value of pretrained language models that incorporate domain-specific knowledge for downstream clinical applications. Furthermore, it proves that utilizing data from related domains can improve the quality of feature embeddings for certain clinical tasks, particularly in situations where domain-specific data is scarce.
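
The C-index reported above can be read as the fraction of comparable patient pairs in which the patient with the higher predicted risk experiences the event earlier. The sketch below follows Harrell's definition with right-censoring. The abstract does not state which estimator the authors used, and the follow-up times, event flags, and risk scores here are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data; tied risk scores count as 0.5."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up times (months), event indicators (1 = event, 0 = censored),
# and model risk scores (higher = predicted worse survival).
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.05, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)
```

In this example four pairs are comparable and three are ordered correctly, giving 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.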

20

External validation of self-supervised transfer learning for noninvasive molecular subtyping of pediatric low-grade glioma using T2-weighted MRI

Yoo, J. J.; Tak, D.; Namdar, K.; Wagner, M. W.; Liu, A.; Tabori, U.; Hawkins, C.; Ertl-Wagner, B. B.; Kann, B. H.; Khalvati, F.

2026-01-30 radiology and imaging 10.64898/2026.01.27.26344883 medRxiv

Top 0.4%

1.8%

Purpose: To externally evaluate three binary classification models designed to differentiate the molecular subtype of pediatric low-grade glioma (pLGG) between BRAF Fusion, BRAF Mutation, and Wild Type on T2-weighted magnetic resonance imaging using self-supervised transfer learning, which enables effective performance in a low data setting. Materials and methods: This retrospective study evaluates pLGG molecular subtyping models, pre-trained using data collected at Dana-Farber Cancer Institute/Boston Children's Hospital, on two datasets from the Hospital for Sick Children, one consisting of patients identified from the electronic health record between January 2000 and December 2018 (n=336) and another consisting of patients identified from the electronic health record between January 2019 and April 2023 (n=87). These datasets consist of T2-weighted MRI with pLGG and corresponding genetic marker identifications, labelled as BRAF Fusion, BRAF Mutation, or Wild Type. The datasets included manually annotated ground-truth segmentations that were used in the classification pipeline during evaluation. The models were evaluated using the area under the receiver operating characteristic curve (AUC). To obtain per-class probabilities across all three considered molecular subtypes, we used the output probabilities from each binary model as logits input to a Softmax function. These probabilities were used to determine the AUC of the models on each evaluated dataset. Results: The models achieved a macro-average AUC of 0.7671 on the newer dataset from the Hospital for Sick Children but achieved a lower macro-average AUC of 0.6463 on the older dataset from the Hospital for Sick Children. Conclusions: The evaluated pLGG molecular subtyping models have the potential for effective generalization but may require further fine-tuning for consistent performance across varying datasets.
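
The step that turns three binary model outputs into per-class probabilities can be sketched as follows. The abstract states only that the binary output probabilities were treated as logits for a softmax. The example values and variable names below are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the three binary (one-subtype-vs-rest) models for one patient.
binary_probs = {"BRAF Fusion": 0.80, "BRAF Mutation": 0.30, "Wild Type": 0.40}

# Treat the binary probabilities as logits and normalise them into a 3-class distribution.
class_probs = dict(zip(binary_probs, softmax(list(binary_probs.values()))))
predicted = max(class_probs, key=class_probs.get)
print(class_probs, predicted)
```

Because the inputs lie between 0 and 1, the resulting distribution is fairly flat, but softmax preserves the ordering of the three scores for each patient. The predicted subtype is therefore the one whose binary model gave the highest probability.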